Abstract

This paper proposes a simple angular random-walk model for building
polypeptide structures that encompass the dihedral-angle properties of folded
proteins. With this model, structures were built with lengths ranging from
125 to 400 amino acids for different fractions of secondary-structure motifs,
whose dihedral angles were randomly chosen according to narrow Gaussian
probability distributions. To measure the fractal dimension of proteins, three
cases were analyzed: the first contained α-helix structures only, the second
β-strand structures only, and the third a mix of α-helices and β-sheets.
Proteins with α-helix motifs are more compact than those in the other cases.
The findings indicate that this model describes some structural properties of a
protein and suggest that randomness is an essential ingredient, but that
proteins are driven by narrow angular Gaussian probability distributions and
not by random-walk processes.
The manner in which a protein folds from a random coil into a unique native
state in a relatively short time is one of the fundamental puzzles of
molecular biophysics. It is well accepted that a unique native
three-dimensional structure, characteristic of each protein and determined by
its amino-acid sequence, dictates protein function. The folding process should
involve a very complex molecular recognition phenomenon that depends on the
interplay of many relatively weak non-bonded interactions. This would lead
to a huge number of possible final conformations under conventional molecular
optimization methods based on the search for minima of the energy
hypersurface. This number, which should increase with the number of degrees
of freedom of the chain, is nevertheless severely restricted during the real
folding process, which excludes relevant portions of the energy landscape as
long as an extended or random conformation is chosen as the initial state
[1][2][3][4][5][6][7][8][9][10]. On the other hand, in the extreme limit where
a polypeptide chain departs from its denatured state and, in a relatively
short period of time, finds its unique native state after searching among the
astronomical number of possible configurations, simulating proteins with
fifty to five hundred amino acids using approaches such as Monte Carlo and
molecular dynamics becomes impracticable because of the very high
computational cost. This contradictory dynamical picture is known as the
Levinthal paradox.

To investigate the role of stochasticity in the final native state, an inverse
strategy is proposed, based on a simple angular 3D random-walk model that
builds protein backbones with different lengths and distinct percentages of
secondary structures. In the proposed model, each step has a fixed radial size
l0, but the dihedral angles Φ and Ψ of the protein backbone are chosen
according to independent Gaussian probability distributions, following the
suggestion given in reference [12]. The mean value and standard deviation of
each distribution are defined according to the allowed regions of the Φ/Ψ plot
of the frequency distribution of dihedral angles, the so-called Ramachandran
map. The Φ and Ψ mean values were used as proposed in the PRELUDE software
package [13]. These values were computed from comparative statistics of the
backbone secondary structure for several amino acid sequences. Table 1
indicates the seven possible pairs of (Φ, Ψ) dihedral angles and the
associated structures of the main chain backbone, as predicted by this method.
These specific angles describe the average conformation of a wide range of
proteins with known backbone structures.
Table 1

Seven possible pairs of dihedral angles and the associated conformations
occurring in several amino acid sequences [13]. The α-helix pair is denoted by
A while the β-strand pair is denoted by B.

To simulate structures with a definite percentage of secondary structures f, a
characteristic number of steps n is fixed and the growth process within these
n steps is divided into two stages:

1) The first n × f steps are built according to an angular Gaussian
probability distribution whose mean value is one pair of angles from Table 1,
which in turn is associated with a given structure.

2) The next n × (1 − f) steps are built according to an angular Gaussian
probability distribution whose mean value at each step is randomly chosen
from among the seven pairs of angles of Table 1.

For the following n steps, rules 1 and 2 are repeated, and so on, until a
peptide chain with N amino acids has been constructed. Therefore, given an
appropriate choice of the fraction f, this stochastic procedure ensures that
the final peptide main chain follows the Ramachandran map.
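
The two growth rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the seven (Φ, Ψ) mean pairs of Table 1 are not reproduced in this text, so MEAN_PAIRS below holds made-up placeholder values, and the function name, its parameters and the rounding of n × f to an integer are likewise hypothetical choices.

```python
import math
import random

# ASSUMPTION: placeholder (phi, psi) mean pairs in degrees. The actual Table 1
# values are not available in this text; these seven pairs are illustrative only.
MEAN_PAIRS = [
    (-60.0, -45.0),   # stands in for the alpha-helix pair "A"
    (-120.0, 130.0),  # stands in for the beta-strand pair "B"
    (-90.0, 0.0),
    (-75.0, 150.0),
    (60.0, 45.0),
    (-150.0, 60.0),
    (75.0, -60.0),
]


def sample_dihedrals(N, n, f, motif, sigma_over_pi=0.1, rng=None):
    """Return N (phi, psi) pairs in degrees, grown in blocks of n steps.

    Rule 1: the first round(n * f) steps of each block use `motif` as the mean.
    Rule 2: the remaining steps pick a mean at random from MEAN_PAIRS at every step.
    Both angles are drawn from independent Gaussians of equal width.
    """
    rng = rng or random.Random()
    sigma = math.degrees(sigma_over_pi * math.pi)  # width converted to degrees
    n_ordered = round(n * f)                       # rounding is an assumption
    angles = []
    for i in range(N):
        position_in_block = i % n
        if position_in_block < n_ordered:
            mean = motif                           # rule 1: fixed motif
        else:
            mean = rng.choice(MEAN_PAIRS)          # rule 2: random pair per step
        phi = rng.gauss(mean[0], sigma)
        psi = rng.gauss(mean[1], sigma)
        angles.append((phi, psi))
    return angles
```

For example, sample_dihedrals(N=250, n=100, f=0.6, motif=MEAN_PAIRS[0]) yields 250 angle pairs in which the first 60 steps of every block of 100 fluctuate around the placeholder helix pair. Turning these angles into three-dimensional coordinates requires further geometric conventions that this sketch does not attempt.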

Within this simple model, structures of the protein backbone were constructed
considering only the dihedral angles. No other bonded or non-bonded
interactions were explicitly considered, nor were excluded volume and steric
effects, which are expected to be taken into account through the appropriate
choice of the mean values of the Gaussian probability distributions. For this
reason it was possible to generate a large number of samples of possible
protein conformations. A large number of simulations (~ 10^) was performed
considering three basic cases: (a) f = 0.6 with α-helix structures; (b) f = 0.6
with β-strand structures; and (c) the first n/2 steps built with f = 0.6 of
α-helix structures and the next n/2 steps with f = 0.6 of β-strand structures,
consecutively. Therefore, in case (a) 60% of the amino acids correspond on
average to α-helix structures, in case (b) 60% of the amino acids correspond
to β-strands, while in the mixed case (c) the whole structure has an average
of 30% α-helix and 30% β-strands. 10^ chains with total size varying from
N = 125 to 400 were generated, with the number of steps n = 100. For each case
described above, f was varied over the interval [0, 1] in steps of 0.1. The
standard deviation σ of the Gaussian distribution was also varied over a wide
range of values, up to π.

Figure 1 shows the average radius of gyration (⟨Rg⟩) as a function of the
number of amino acids (N) for the three distinct choices of structures. A
power-law behavior can be observed in this plot, indicating that these
structures are self-similar. The corresponding scaling exponent ν, which
describes the compactness of the structure, is calculated from the scaling
relation Rg ~ N^ν in all cases. The characteristic scaling exponents are
ν = 0.401 ± 0.002 for the α-helix case and ν = 0.417 ± 0.002 for the β-strand
case, which fall within the range of values found for real proteins. For the
mixed case, ν = 0.409 ± 0.002 was obtained. To further analyze the compactness
of structures based on the α-helix, on the β-strand, and on a mix of α-helices
and β-strands, the scaling exponent ν was estimated as a function of the
percentage of secondary structure f.

Fig. 1. The average radius of gyration as a function of the number of residues
obtained from the simulations with f = 0.60 for the α-helix structures (□),
the mixed structures (△) and the β-strand structures (○), with scaling
exponents 0.401 ± 0.002, 0.409 ± 0.002 and 0.417 ± 0.002, respectively,
obtained from the relation ⟨Rg⟩ ~ N^ν. Each point results from the average of
10^ simulations. The dashed lines indicate the linear fitting. The error bars
are smaller than the symbol sizes. In all cases σ/π = 0.1 was fixed.
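
The scaling analysis described above amounts to a straight-line fit in log-log coordinates. The following Python sketch shows one way to do it, assuming the backbone coordinates are available as (x, y, z) tuples and that an ordinary least-squares fit is acceptable; the function names are hypothetical and the paper does not state which fitting procedure was actually used.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their geometric centre."""
    m = len(coords)
    centre = [sum(p[k] for p in coords) / m for k in range(3)]
    squared = sum(
        sum((p[k] - centre[k]) ** 2 for k in range(3)) for p in coords
    )
    return math.sqrt(squared / m)


def fit_scaling_exponent(lengths, radii):
    """Least-squares slope of log(radius) versus log(length).

    For data obeying radius ~ length**nu, the slope is an estimate of nu.
    """
    xs = [math.log(n) for n in lengths]
    ys = [math.log(r) for r in radii]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

In practice one would compute radius_of_gyration for every generated chain, average the values at each chain length N, and pass the lengths and averaged radii to fit_scaling_exponent. Feeding it synthetic radii of the form 2.0 * N**0.4 recovers a slope of 0.4 up to floating-point error, which is a quick sanity check of the fit.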

Figure 2 shows that, for the three cases studied, the scaling exponent ν grows
from ν = 0.302 ± 0.002 when the percentage of secondary structures is close to
zero (f ≈ 0), i.e., a completely random structure, up to
ν_max = 0.520 ± 0.001, corresponding to fully ordered structures. Slightly
different values were observed within the interval 0 < f < 1 for the
structures composed of α-helices, β-strands and the mixed case, the one built
with α-helix motifs being more compact than the others. It is worth mentioning
that the lower limit for the scaling exponent is 1/3, which corresponds to the
fully compact three-dimensional structures commonly observed for globular
proteins. However, the plots of Figure 3 show that the scaling exponent lies
below 1/3 for low values of the fraction of secondary structure motifs
(0 < f < 0.3). This interval corresponds to a high fraction (1 − f) of random
structure, built with dihedral angles chosen from Gaussian distributions
centered at values randomly chosen at each step from those given in Table 1.
Therefore, excluded volume and steric effects are not expected to play their
role in this regime, and the model fails to reproduce the expected ν = 1/3
limit behavior.
Fig. 2. The dependence of the scaling exponent ν on the percentage f of
secondary structures, for α-helices (□), β-strands (○) and the mixed case (△).
The error bars are smaller than the symbol sizes. In all cases σ/π = 0.1 was
fixed. The dashed line indicates the experimental value ν ≈ 0.405.



At this point the behavior of the exponent ν against changes in the variance
of the Gaussian probability distributions is explored. Increasing the
dispersion of the (Ψ, Φ) Gaussian distribution increases the randomness of the
chain structure, destroying the role of the percentage f of secondary
structures. Figure 3 illustrates the behavior of ν as a function of σ/π for
the α-helix case, considering f = 0.0, 0.40, 0.60, 0.80 and 1.0. For large
variance values, the exponent ν increases toward 0.5 for all values of f,
which is associated with the lack of ordering. In the opposite limit of
vanishing variance, the exponent ν increases toward one, which can easily be
shown to correspond to the power-law exponent of the linear chain backbone
structure for the f = 1.0 case. Between these two limits, however, a minimum
value of ν is observed around σ/π = 0.15, independent of the value of f
(≠ 1.0), corresponding to the most compact structures. Therefore, when fitted
with the ν value extracted from data, as discussed below, the corresponding
optimum value of f is found to be around f ≈ 0.60.

Fig. 3. The dependence of the scaling exponent ν on the variance σ of the
Gaussian probability distributions of the dihedral angles (in units of π), for
the α-helix structure, considering f = 0, 0.40, 0.60, 0.80 and 1.0. The dashed
line indicates the experimental value ν ≈ 0.405.

A total of 1826 different protein chains deposited in the Brookhaven Protein
Data Bank (PDB) were also investigated in order to provide a comparison with
the simulation results. The average radius was measured as a function of the
number of amino acids. Figure 4 depicts the main characteristics of all
systems discussed herein using geometric (dihedral angle) analysis. The figure
indicates that several protein chains deposited in the PDB have a self-similar
behavior when the average radius (⟨R⟩) is plotted against the number of amino
acids (N). Here the average radius denotes the average distance from the
geometric centre of all coordinates [10]. Note that an average value was
calculated for the radii of chains with the same number of amino acids. This
intrinsic characteristic of protein structures must be responsible for several
aspects of these molecules, such as their high compactness, which has been
discussed in several other contexts [TP][14][15][16][17][18][19]. The exponent
ν associated with the average radius is 0.404 ± 0.008, which is in agreement
with a recent similar study involving 200 proteins [20]. The volume and mass
of proteins with more than fifty amino acids scale with respect to the average
radius with exponents δ = 2.47 ± 0.04 [ig] and δ = 2.47 ± 0.03 [HI],
respectively. This corresponds to an exponent ν = 1/δ ≈ 0.405 if we assume
that the mass scales with the number of amino acids with exponent one. This
exponent would be associated with mixed chains composed of secondary
structures, for which the present simulations give values in the interval
0.401 < ν < 0.417. Furthermore, the PDB proteins present structures with ~60%
of secondary structures [21], which justifies and confirms our initial
assumption of f = 0.6 for the simulated structures shown in Figure 1. It is
worth mentioning that the present model can be generalized to grow chains with
distinct percentages of different structures, according to the corresponding
protein Ramachandran map.

Fig. 4. The behavior of the average radius ⟨R⟩ against the number of amino
acids N for a set of 1826 protein chains. The scaling exponent is
ν = 0.40 ± 0.02.
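
The link between the mass-size exponent δ and the scaling exponent ν quoted above can be written out as a short chain of relations. The sketch below assumes that the mass M grows linearly with the number of amino acids N, i.e. that every residue contributes a comparable mass; this assumption is made here for illustration and is what allows the two exponents to be compared.

```latex
\begin{align}
  M &\sim \langle R \rangle^{\delta}, \qquad \delta \approx 2.47, \\
  M &\sim N \quad \text{(assumption: comparable mass per residue)}, \\
  \langle R \rangle &\sim N^{1/\delta}
  \;\Longrightarrow\;
  \nu = \frac{1}{\delta} = \frac{1}{2.47} \approx 0.405 .
\end{align}
```

Read in the other direction, the same relation turns a fitted exponent ν into a fractal dimension δ = 1/ν.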

Through the approach presented herein, the protein folding problem has been
investigated assuming that proteins fold in a mixed manner, following some
directed process while being subject to certain stochastic ingredients. This
means that the process is neither completely random, as posed by Levinthal's
paradox, nor entirely driven by physical chemistry principles that establish a
definite folding pathway. The simple model presented herein focuses on the
stochastic aspects of the formation of secondary structures, which is believed
to be the earliest relevant precursor event in folding, as confirmed by recent
experimental evidence [22]. According to rules 1 and 2 of the present model,
the formation of a secondary structure (α-helix or β-strand) occurs during a
consecutive fraction f of steps governed by a Gaussian probability
distribution whose parameters (mean and standard deviation) are extracted from
data associated with the Ramachandran map. These parameters, which
characterize a given structure, reflect the physical and chemical processes
underlying protein stability. Therefore this fraction f of the chain somehow
mimics the interplay between energy stability and entropy. Thus, once the
structure reaches a certain size, it loses stability and folds randomly,
changing the mean value of the Gaussian probability distribution at each step,
but still using the possible values extracted from data.

What emerges from this stochastic process is that the narrow Gaussian
probability distribution of helical or stranded arrays provides an insight
into the protein folding process that goes beyond the possibilities of
molecular structure or molecular dynamics analysis. This narrow Gaussian
probability distribution yields a peptide backbone chain with self-similar
properties that match those estimated from experimental data (see Figure 4).
Furthermore, Figure 3 shows that if a probability distribution with a large
variance (large values of σ/π) is considered, approaching a uniform
probability distribution, the resulting final structure is less compact than
that obtained with the narrow Gaussian distribution. One further interesting
result obtained with the present model is that backbone chains with α-helix
motifs are more compact than the β-strand and mixed chains, a result confirmed
by the current literature [P][23][24]. The fractal dimension (δ = 1/ν ≈ 2.49)
obtained from Figure 1 and Figure 4 is comparable with that obtained by the
analysis of volume against radius [15] or by mass-size exponent analysis
(δ ≈ 2.47) [17][18][19][20].

The results of this study indicate that simulated structures are more compact
when secondary-structure portions (α-helices and/or β-sheets) are present than
those built with the other sets of dihedral angles shown in Table 1. The
method was systematically compared to other widely used methods of protein
folding analysis [1][2][51I71I3]. Several of these methods do not result in a
fully consistent assignment of self-similarity of protein structures. The
recent work of Huang [25] should be mentioned, in which a sophisticated
conditioned self-avoiding walk model was proposed that takes into account the
hydrophobic effect and hydrogen bonding, focusing on the physical chemistry
mechanisms underlying the protein folding process. Independent of the details
of these mechanisms, building protein backbones with the method proposed in
the present work suggests that these structures are driven by narrow Gaussian
distributions. Thus the general conclusion of this work is that a protein
folds like an angular Gaussian walk and not like a random walk.